Modeling Human Sentence Processing Data with a Statistical Parts-of-Speech Tagger
Abstract
It has previously been assumed in the psycholinguistic literature that finite-state models of language are crucially limited in their explanatory power by the locality of the probability distribution and the narrow scope of information used by the model. We show that a simple computational model (a bigram part-of-speech tagger based on the design used by Corley and Crocker (2000)) makes correct predictions on processing difficulty observed in a wide range of empirical sentence processing data. We use two modes of evaluation: one that relies on comparison with a control sentence, paralleling practice in human studies; another that measures probability drop in the disambiguating region of the sentence. Both are surprisingly good indicators of the processing difficulty of garden-path sentences. The sentences tested are drawn from published sources and systematically explore five different types of ambiguity: previous studies have been narrower in scope and smaller in scale. We do not deny the limitations of finite-state models, but argue that our results show that their usefulness has been underestimated.
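The abstract does not give the tagger's parameters or the exact form of the probability-drop measure, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a bigram tagger scored with Viterbi decoding and takes the "drop" at a word to be the log probability lost by the best tag path when that word is added. The names `viterbi_prefix_logprobs`, `per_word_drops`, `FLOOR`, and all toy probabilities are hypothetical and are not taken from the paper or from Corley and Crocker (2000).

```python
import math

FLOOR = 1e-12  # assumed floor probability for unseen events (not from the paper)


def viterbi_prefix_logprobs(words, tags, trans, emit, start="<s>"):
    """Best-path log probability after each word under a bigram tagger.

    trans maps (previous_tag, tag) to P(tag | previous_tag).
    emit maps (tag, word) to P(word | tag).
    """
    best = {start: 0.0}  # best log probability of any path ending in each tag
    history = []
    for word in words:
        new_best = {}
        for tag in tags:
            # Best way to reach `tag` from any tag of the previous word.
            best_prev = max(
                score + math.log(trans.get((prev, tag), FLOOR))
                for prev, score in best.items()
            )
            new_best[tag] = best_prev + math.log(emit.get((tag, word), FLOOR))
        best = new_best
        history.append(max(best.values()))
    return history


def per_word_drops(history):
    """Log probability lost at each word; larger values mean a bigger drop."""
    drops, prev = [], 0.0
    for value in history:
        drops.append(prev - value)
        prev = value
    return drops


# Toy parameters, invented purely for illustration.
TAGS = ("N", "V")
TRANS = {
    ("<s>", "N"): 0.7, ("<s>", "V"): 0.3,
    ("N", "N"): 0.3, ("N", "V"): 0.7,
    ("V", "N"): 0.8, ("V", "V"): 0.2,
}
EMIT = {("N", "man"): 0.5, ("V", "man"): 0.1, ("V", "sail"): 0.9}

history = viterbi_prefix_logprobs(["man", "sail"], TAGS, TRANS, EMIT)
drops = per_word_drops(history)
```

Under these assumptions, the two evaluation modes could be approximated by comparing `drops` at the disambiguating word of a garden-path sentence either with the drops elsewhere in the same sentence or with the drop at the matching position in a control sentence.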
Similar resources
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. Although treebanks originated in computational linguistics, their value is becoming more widely appreciated in linguistics research as a whole. F...
A comparative study of the effect of part-of-speech tagging on parsing in automatic processing of the Persian language
In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...
Enhancing the Performance of Part of Speech tagging of Nepali language through Hybrid approach
Part-of-speech tagging is the process of marking up the words in a text (corpus) as corresponding to a particular part of speech, based on both their definitions and their context, i.e. their relationship with adjacent and related words in a phrase, sentence, or paragraph. Part-of-Speech (POS) tagging is the process of assigning the appropriate part of speech or lexical category to each word in a ...
Quantum Neural Network based Parts of Speech Tagger for Hindi
Part-of-speech disambiguation in corpora is one of the most challenging areas in Natural Language Processing. However, some work has been done in the past to overcome the problem of bilingual corpora disambiguation for Hindi using Hidden Markov Models and Neural Networks. In this paper, a Quantum Neural Network (QNN) has been used for a Hindi parts of speech tagger. To analyze the effectiveness of the proposed ...
Part-of-speech tagging of the Persian language using a fuzzy network model
Part-of-speech tagging (POS tagging) is an area of ongoing research in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is to determine the grammatical ...